Lecture 12¶
Desargues Theorem¶
This is modified from Homework 4.
Recall that three points are collinear if they lie on a common line. A triple of points $\{a, b, c\}$ is a triangle if the points are not collinear.
Consider two triangles $\{a_0, b_0, c_0\}$ and $\{a_1, b_1, c_1\}$.
We say the triangles are in perspective centrally if the three lines $\overline{a_0 a_1}$, $\overline{b_0 b_1}$, and $\overline{c_0 c_1}$ all pass through a common point called the center of perspectivity (the three lines are concurrent).
We say the triangles are in perspective axially if the three points $\overline{a_0 b_0} \cap \overline{a_1 b_1}$, $\overline{b_0 c_0} \cap \overline{b_1 c_1}$, and $\overline{c_0 a_0} \cap \overline{c_1 a_1}$ all lie on a common line called the axis of perspectivity (the three points are collinear).
Desargues' Theorem says that two triangles are in perspective centrally if and only if they are in perspective axially.
Proof¶
We use the following function to construct symbolic points in the projective plane.
def sym_vector(name):
    '''name should be a string.'''
    v = vector(SR, [var(f'{name}x'), var(f'{name}y'), var(f'{name}z')])
    return v
Here we construct lists of the points: $a = [a_0, a_1]$, $b=[b_0, b_1]$, and $c=[c_0, c_1]$.
a = [sym_vector(f'a{i}') for i in range(2)]
b = [sym_vector(f'b{i}') for i in range(2)]
c = [sym_vector(f'c{i}') for i in range(2)]
a + b + c
[(a0x, a0y, a0z), (a1x, a1y, a1z), (b0x, b0y, b0z), (b1x, b1y, b1z), (c0x, c0y, c0z), (c1x, c1y, c1z)]
Below we construct normal vectors associated to the lines $\overline{a_0 a_1}$, $\overline{b_0 b_1}$, and $\overline{c_0 c_1}$.
na = a[0].cross_product(a[1])
nb = b[0].cross_product(b[1])
nc = c[0].cross_product(c[1])
na,nb,nc
((-a0z*a1y + a0y*a1z, a0z*a1x - a0x*a1z, -a0y*a1x + a0x*a1y), (-b0z*b1y + b0y*b1z, b0z*b1x - b0x*b1z, -b0y*b1x + b0x*b1y), (-c0z*c1y + c0y*c1z, c0z*c1x - c0x*c1z, -c0y*c1x + c0x*c1y))
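As a sanity check (our own addition, not part of the original homework), each normal vector should be orthogonal to the two points spanning its line. For example:

# The normal vector of the line a0a1 is orthogonal to both a0 and a1;
# both dot products below expand to zero.
(na.dot_product(a[0]).expand(), na.dot_product(a[1]).expand())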
The lines $\overline{a_0 a_1}$, $\overline{b_0 b_1}$, and $\overline{c_0 c_1}$ are concurrent if and only if there is a non-zero vector (the common point) whose dot product with each of the normal vectors above is zero. Equivalently, we can arrange the normal vectors into a matrix, and the lines are concurrent if and only if the determinant is zero. Below we compute this determinant.
A = matrix([na,nb,nc])
A_det = A.det().expand().factor()
A_det
a0z*a1y*b0z*b1x*c0y*c1x - a0y*a1z*b0z*b1x*c0y*c1x - a0z*a1x*b0z*b1y*c0y*c1x + a0x*a1z*b0z*b1y*c0y*c1x - a0z*a1y*b0x*b1z*c0y*c1x + a0y*a1z*b0x*b1z*c0y*c1x + a0z*a1x*b0y*b1z*c0y*c1x - a0x*a1z*b0y*b1z*c0y*c1x - a0z*a1y*b0y*b1x*c0z*c1x + a0y*a1z*b0y*b1x*c0z*c1x + a0z*a1y*b0x*b1y*c0z*c1x - a0y*a1z*b0x*b1y*c0z*c1x + a0y*a1x*b0z*b1y*c0z*c1x - a0x*a1y*b0z*b1y*c0z*c1x - a0y*a1x*b0y*b1z*c0z*c1x + a0x*a1y*b0y*b1z*c0z*c1x - a0z*a1y*b0z*b1x*c0x*c1y + a0y*a1z*b0z*b1x*c0x*c1y + a0z*a1x*b0z*b1y*c0x*c1y - a0x*a1z*b0z*b1y*c0x*c1y + a0z*a1y*b0x*b1z*c0x*c1y - a0y*a1z*b0x*b1z*c0x*c1y - a0z*a1x*b0y*b1z*c0x*c1y + a0x*a1z*b0y*b1z*c0x*c1y + a0z*a1x*b0y*b1x*c0z*c1y - a0x*a1z*b0y*b1x*c0z*c1y - a0y*a1x*b0z*b1x*c0z*c1y + a0x*a1y*b0z*b1x*c0z*c1y - a0z*a1x*b0x*b1y*c0z*c1y + a0x*a1z*b0x*b1y*c0z*c1y + a0y*a1x*b0x*b1z*c0z*c1y - a0x*a1y*b0x*b1z*c0z*c1y + a0z*a1y*b0y*b1x*c0x*c1z - a0y*a1z*b0y*b1x*c0x*c1z - a0z*a1y*b0x*b1y*c0x*c1z + a0y*a1z*b0x*b1y*c0x*c1z - a0y*a1x*b0z*b1y*c0x*c1z + a0x*a1y*b0z*b1y*c0x*c1z + a0y*a1x*b0y*b1z*c0x*c1z - a0x*a1y*b0y*b1z*c0x*c1z - a0z*a1x*b0y*b1x*c0y*c1z + a0x*a1z*b0y*b1x*c0y*c1z + a0y*a1x*b0z*b1x*c0y*c1z - a0x*a1y*b0z*b1x*c0y*c1z + a0z*a1x*b0x*b1y*c0y*c1z - a0x*a1z*b0x*b1y*c0y*c1z - a0y*a1x*b0x*b1z*c0y*c1z + a0x*a1y*b0x*b1z*c0y*c1z
From the above we see that the triangles are in perspective centrally if and only if A_det == 0.
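For a concrete instance of this determinant test (a toy example of ours, not from the homework): the lines $x=0$, $y=0$, and $x=y$ all pass through the point $(0:0:1)$, and the matrix of their normal vectors is singular:

# Toy check: three concurrent lines give a singular matrix of normals.
matrix([(1,  0, 0),   # the line x = 0
        (0,  1, 0),   # the line y = 0
        (1, -1, 0)]   # the line x = y
      ).det()         # returns 0, so the lines are concurrent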
Now we construct the three points $p_c = \overline{a_0 b_0} \cap \overline{a_1 b_1}$, $p_a = \overline{b_0 c_0} \cap \overline{b_1 c_1}$, and $p_b = \overline{c_0 a_0} \cap \overline{c_1 a_1}$.
pc = a[0].cross_product(b[0]).cross_product(a[1].cross_product(b[1]))
pa = b[0].cross_product(c[0]).cross_product(b[1].cross_product(c[1]))
pb = c[0].cross_product(a[0]).cross_product(c[1].cross_product(a[1]))
These points are collinear if and only if they are linearly dependent. Letting $B$ be the matrix whose rows are $p_a$, $p_b$, and $p_c$, we have that the points are linearly independent if and only if the determinant of $B$ is non-zero. Thus the triangles are in perspective axially if and only if the determinant of $B$ is zero.
B = matrix([pa,pb,pc])
B_det = B.det().expand()
Observe that A_det is a polynomial of degree six. By printing B_det we can see that it has degree twelve. We guess that A_det is a factor of B_det, and indeed this is the case:
ratio = (B_det/A_det).rational_simplify().factor()
ratio
(a0z*b0y*c0x - a0y*b0z*c0x - a0z*b0x*c0y + a0x*b0z*c0y + a0y*b0x*c0z - a0x*b0y*c0z)*(a1z*b1y*c1x - a1y*b1z*c1x - a1z*b1x*c1y + a1x*b1z*c1y + a1y*b1x*c1z - a1x*b1y*c1z)
What is the meaning of the factors? Well, there was a hypothesis that $\{a_0, b_0, c_0\}$ and $\{a_1, b_1, c_1\}$ are triangles. That they are triangles means that the triples cannot be collinear, which means these two polynomials are non-zero:
p0 = matrix([a[0], b[0], c[0]]).det().expand()
p0
-a0z*b0y*c0x + a0y*b0z*c0x + a0z*b0x*c0y - a0x*b0z*c0y - a0y*b0x*c0z + a0x*b0y*c0z
p1 = matrix([a[1], b[1], c[1]]).det().expand()
p1
-a1z*b1y*c1x + a1y*b1z*c1x + a1z*b1x*c1y - a1x*b1z*c1y - a1y*b1x*c1z + a1x*b1y*c1z
Indeed, B_det/A_det is p0*p1:
ratio - p0*p1
0
We have B_det == A_det * p0 * p1. This means that the triples $\{a_0, b_0, c_0\}$ and $\{a_1, b_1, c_1\}$ are in perspective axially if and only if they are in perspective centrally, $\{a_0, b_0, c_0\}$ is collinear, or $\{a_1, b_1, c_1\}$ is collinear. Since collinear triples were ruled out by hypothesis, Desargues' theorem is true.
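As a quick sanity check of this identity (our own addition), we can substitute random rational values for all eighteen coordinates; since the identity holds symbolically, the difference evaluates to zero:

# Substitute random rational coordinates and confirm B_det == A_det*p0*p1.
coords = [v for w in a + b + c for v in w]       # the 18 symbolic coordinates
values = {v: QQ.random_element() for v in coords}
(B_det - A_det*p0*p1).subs(values)               # should print 0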
3D graphics¶
Plotting a function $f(x,y)$.¶
To make a somewhat interesting example, let us first define a triangle wave $g(t)$:
g(t) = 2*abs(t/2 - floor(t/2+1/2))
plot(g, (x, -3, 3), aspect_ratio=1)
f(x,y) = g(x^2+y^2)
show(f)
r = 1.5
plot3d(f, (x,-r,r), (y,-r,r))
It looks like there are some defects. These can be reduced by increasing the number of points used:
plot3d(f, (x,-r,r), (y,-r,r), plot_points=200)
The option plot_points is one of several options accepted by plot3d. Here is an excerpt from the in-notebook documentation (plot3d?):
"adaptive" -- (default: False) whether to use adaptive refinement to draw the plot (slower, but may look better). This option does NOT work in conjunction with a transformation (see below).
"mesh" -- bool (default: False) whether to display mesh grid lines
"dots" -- bool (default: False) whether to display dots at mesh grid points
"plot_points" -- (default: "automatic") initial number of sample points in each direction; an integer or a pair of integers
The mesh option is useful for seeing why the initial plot looked bumpy:
plot3d(f, (x,-r,r), (y,-r,r), mesh=True)
There are also different viewers.
The default viewer seems to be Three.js JavaScript WebGL Renderer. That link provides a lot of further options to customize your plot:
aspect_ratio -- (default: [1,1,1]) list or tuple of three numeric values
auto_scaling -- (default: [False, False, False]) list or tuple of three booleans; set to True to automatically scale down the corresponding direction if it is too large
axes -- (default: False) Boolean determining whether coordinate axes are drawn
axes_labels -- (default: ['x','y','z']) list or tuple of three strings; set to False to remove all labels
axes_labels_style -- (default: None) list of three dicts, one per axis, or a single dict controlling all three axes; supports the same styling options as text3d such as color, opacity, fontsize, fontweight, fontstyle, and fontfamily
color -- (default: 'blue') color of the 3D object
decimals -- (default: 2) integer determining decimals displayed in labels
depth_write -- (default: True for opaque surfaces, False for transparent surfaces) whether to write the surface's depth into the depth buffer for the purpose of occluding objects behind it
frame -- (default: True) Boolean determining whether frame is drawn
online -- (default: False) Boolean determining whether the local standard package files are replaced by links to an online content delivery network
opacity -- (default: 1) numeric value for transparency of lines and surfaces
page_title -- (default: None) string containing the title of the generated HTML page; often displayed in the browser window's title bar, its tab list, and/or the operating system's task bar
projection -- (default: 'perspective') the type of camera projection to use; 'perspective' or 'orthographic'
radius -- (default: None) numeric value for radius of lines; use to render lines thicker than available using thickness or on Windows platforms where thickness is ignored
render_order -- (default: 0) numeric value for rendering order of transparent surfaces; objects render from lowest to highest value, ensuring that lower-valued objects render completely
single_side -- (default: False) Boolean determining whether both sides of a surface material are rendered; set to True to reduce rendering artifacts for closed transparent surfaces
theme -- (default: 'light') the color scheme to use for the scene and user interface; 'light' or 'dark'
thickness -- (default: 1) numeric value for thickness of lines
viewpoint -- (default: None) list or tuple of the form [[x,y,z],angle] setting the initial viewpoint of the scene, where angle is in degrees; can be determined using the 'Get Viewpoint' option of the information menu
In addition, there are some animation options.
plot3d(f, (x,-r,r), (y,-r,r), viewer='threejs',
       plot_points=200, frame=False, opacity=0.9)
The Tachyon 3D Ray Tracer is non-interactive but can generate high-quality images. It requires some playing with options to get a nice plot.
Here we demonstrate the non-interactive tachyon viewer. It would presumably look a bit better on a smooth surface.
plot3d(f, (x,-r,r), (y,-r,r),
       antialiasing=True, raydepth=3, opacity=1,
       camera_position=(3,0.6,1.5),
       plot_points=300, color='purple',
       frame=False, viewer='tachyon',
       viewpoint=[[-0.3644,5,-0.8207],111.59])
Points in space¶
Lists of points can be displayed in 3D with point3d:
def get_random_points(n):
    return [(random(), random(), random()) for i in range(n)]

plt = point3d(get_random_points(50), color='red', size=10) + point3d(get_random_points(50), color='blue', size=7)
plt
Spheres¶
A sphere can be constructed with a center and a radius:
sphere((0,0,0), 1)
Here is a higher-quality version of the plot above, using small spheres instead of points:
sum([sphere(center, 0.03) for center in get_random_points(50)]) + \
sum([sphere(center, 0.04, color='red') for center in get_random_points(50)])
Some other shapes are obtainable by running an import. For example, boxes:
from sage.plot.plot3d.shapes import Box

def randomly_positioned_box(**kwds):
    return Box([0.03,0.06,0.12], **kwds).translate((random(),random(),random()))

randomly_positioned_box(color='red') + randomly_positioned_box(color='blue') + randomly_positioned_box(color='yellow')
Block stacking¶
Say a block has width one. How far can we slide it off the table before it falls? Position the table so that the surface has negative $x$-coordinate with the edge at $x=0$.
var('x')
block_shift_off_table = x
center_of_mass = x - 1/2
eq = center_of_mass == 0
solve(eq, x)
[x == (1/2)]
Now suppose we have two blocks. The top one can be pushed halfway off the bottom block. How far can we shift the bottom block before the stack falls?
var('x')
bottom_block = x
top_block = bottom_block+1/2
# Set the sum of the centers of mass equal to zero
eq = (top_block-1/2)+(bottom_block-1/2) == 0
solve(eq, x)
[x == (1/4)]
Now if we have three blocks:
var('x')
bottom_block = x
middle_block = bottom_block+1/4
top_block = middle_block + 1/2
# We compute the center of mass of the block stack and set it equal to zero
eq = (top_block-1/2)+(middle_block-1/2)+(bottom_block-1/2) == 0
solve(eq, x)
[x == (1/6)]
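In general, the $k$-th block from the top can be shifted by $\frac{1}{2k}$, so the total overhang of an $n$-block stack is half the $n$-th harmonic number. A quick check of this pattern (the helper total_overhang is our own addition):

def total_overhang(n):
    # Sum of the shifts 1/(2k) for k = 1, ..., n: half the n-th harmonic number.
    return sum(1/(2*k) for k in range(1, n+1))

[total_overhang(n) for n in range(1, 4)]   # should print [1/2, 3/4, 11/12]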
from sage.plot.plot3d.shapes import Box

def block_stack(n, dims=[8, 4, 1]):
    dims = 1/2*vector(QQ, dims)
    def box(i):
        if i%2 == 0:
            return Box(list(dims), color='yellow').translate([-dims[0], 0, dims[2]])
        else:
            return Box(list(dims), color='blue').translate([-dims[0], 0, dims[2]])
    plt = Box([dims[0], dims[0], 1], color='brown').translate([-dims[0], 0, -1])
    delta_x = 0
    shifts = []
    for i in range(n):
        shifts.append(1/(2*(n-i)))
        delta_x += 2*dims[0]/(2*(n-i))
        delta_z = i*2*dims[2]
        b = box(i).translate((delta_x, 0, delta_z))
        if plt is None:
            plt = b
        else:
            plt += b
    show(shifts)
    return plt
block_stack(3)
@interact
def interactive_block_stack(n='1'):
    n = ZZ(n)
    show(block_stack(n), frame=False)
Wikipedia has an article on the block stacking problem.
Lines in space¶
def random_walk(n):
    directions = [
        vector([ 1, 0, 0]),
        vector([-1, 0, 0]),
        vector([ 0, 1, 0]),
        vector([ 0,-1, 0]),
        vector([ 0, 0, 1]),
        vector([ 0, 0,-1]),
    ]
    p = vector([0,0,0])
    l = [p]
    for i in range(n):
        j = floor(6*random())
        p += directions[j]
        l.append(p)
    return l
line3d(random_walk(100))
line3d(random_walk(100), thickness=5)
The radius option makes lines appear as cylinders.
line3d(random_walk(100), radius=1/10)
Parametric curves¶
p(t) = (t*cos(t), t*sin(t), t)
parametric_plot3d(p, (t, 0,20))
parametric_plot3d(p, (t, 0,20), thickness=5, color='green')
Parametric surfaces¶
We plot a helicoid, which in cylindrical coordinates is given by $\theta=z$ (or here $\theta=6z$):
f(r, z) = (r*cos(6*z), r*sin(6*z), z)
parametric_plot3d(f, (r, -1, 1), (z, 0, 2*pi/3))
f(r, z) = (r*cos(6*z), r*sin(6*z), z)

def c(r, z):
    return abs(sin(6*z))

cm = colormaps.rainbow
parametric_plot3d(f, (r, -1, 1), (z, 0, pi/2), color=(c,cm), plot_points=200)
There are more examples of the use of color maps in the Sage documentation for parametric_plot3d.
Least squares in 2D:¶
Consider the function $f:\mathbb R^2 \to \mathbb R$ given by $$f(x,y) = \cos(x) \cos(y).$$ Find the best approximation $g(x,y)$ from the span of the collection of functions $$\{1, x, y, x^2, y^2, xy, \cos(x), \cos(y)\}$$ in the sense that it minimizes the integral $$\iint_S \big(f(x,y)-g(x,y)\big)^2~dx~dy$$ over the square $S=[-r,r] \times [-r, r]$ with $r=\frac{3}{2}$.
Our function $f$:
f(x,y) = cos(x)*cos(y)
r = 3/2
plt1 = plot3d(f, (x, -r, r), (y, -r, r))
plt1
We have eight functions so we need eight coefficients to define a general $g$.
c = [var(f'c{i}') for i in range(8)]
Here are our eight functions (as Sage expressions):
var('x y')
basis = [1, x, y, x^2, y^2, x*y, cos(x), cos(y)]
show(basis)
Here we write the general linear combination of the functions above.
g(x,y) = sum([c[i]*basis[i] for i in range(8)])
g
(x, y) |--> c3*x^2 + c5*x*y + c4*y^2 + c1*x + c2*y + c6*cos(x) + c7*cos(y) + c0
We set $I$ equal to the integral: $$I = \iint_S \big(f(x,y)-g(x,y)\big)^2~dx~dy.$$
r = 3/2
I = ((f-g)^2).integral(x, -r, r).integral(y, -r, r)
I.expand()
-4*c6*c7*cos(3) + 36*c3*c6*cos(3/2) + 36*c4*c7*cos(3/2) + 3/2*c6^2*sin(3) + 3/2*c7^2*sin(3) + 12*c0*c6*sin(3/2) + 3*c3*c6*sin(3/2) + 9*c4*c6*sin(3/2) + 12*c0*c7*sin(3/2) + 9*c3*c7*sin(3/2) + 3*c4*c7*sin(3/2) - 24*c4*cos(3/2)*sin(3/2) - 2*c7*sin(3)*sin(3/2) - 8*c0*sin(3/2)^2 - 2*c4*sin(3/2)^2 + 9*c0^2 + 27/4*c1^2 + 27/4*c2^2 + 27/2*c0*c3 + 729/80*c3^2 + 27/2*c0*c4 + 81/8*c3*c4 + 729/80*c4^2 + 81/16*c5^2 + 9/2*c6^2 + 4*c6*c7 + 9/2*c7^2 + c6*cos(9/2) + c3*cos(3) - c6*cos(3/2) - 12*c3*sin(3) + 1/4*sin(3)^2 - 6*c6*sin(3/2) - 6*c7*sin(3/2) - c3 + 3/2*sin(3) + 9/4
This will be a polynomial of degree two in the coefficients. We will use this as an opportunity to explain why least squares works in cases like this. So, let's work out a linear-algebraic formula for the integral $I$. First we arrange the quadratic coefficients of $I$ into a matrix as below. (We divide by two in the off-diagonal entries; you will see why in a moment.)
M = matrix([[I.coefficient(c[i]*c[j]) if i==j else I.coefficient(c[i]*c[j])/2
             for j in range(8)]
            for i in range(8)])
show(M)
So that we can do linear algebra with $c$, let us make $c$ into a vector instead of a list:
c = vector(c)
Now the quadratic part of $I$ is given by $c \cdot (M c)$. This is why we needed to divide the off-diagonal entries by two: when you expand this product you get two copies of $c_i c_j$ with $i \neq j$ and one copy of each $c_i^2$. So, to get the terms of $I$ of degree at most one, we can do:
I_1 = (I - c*(M*c)).expand() # Expand will force some simplification.
I_1
-24*c4*cos(3/2)*sin(3/2) - 2*c7*sin(3)*sin(3/2) - 8*c0*sin(3/2)^2 - 2*c4*sin(3/2)^2 + c6*cos(9/2) + c3*cos(3) - c6*cos(3/2) - 12*c3*sin(3) + 1/4*sin(3)^2 - 6*c6*sin(3/2) - 6*c7*sin(3/2) - c3 + 3/2*sin(3) + 9/4
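As an aside, here is the divide-by-two bookkeeping on a toy example of ours: for the quadratic form $q = 3u_0^2 + 5u_0 u_1 + 2u_1^2$, the cross coefficient $5$ gets split in half across the two off-diagonal entries of the matrix:

# Toy illustration: recovering a quadratic form from its symmetric matrix.
var('u0 u1')
u = vector([u0, u1])
Mq = matrix([[3, 5/2], [5/2, 2]])              # off-diagonal entries are 5/2, not 5
bool(3*u0^2 + 5*u0*u1 + 2*u1^2 == u*(Mq*u))    # True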
The linear part can be represented as a dot product, say $v \cdot c$. Here we obtain $v$:
v = vector([I_1.coefficient(c[i]) for i in range(8)])
v
(-8*sin(3/2)^2, 0, 0, cos(3) - 12*sin(3) - 1, -24*cos(3/2)*sin(3/2) - 2*sin(3/2)^2, 0, cos(9/2) - cos(3/2) - 6*sin(3/2), -2*sin(3)*sin(3/2) - 6*sin(3/2))
The constant term should then be $I_1-v\cdot c$:
k = (I_1 - v*c).expand()
k
1/4*sin(3)^2 + 3/2*sin(3) + 9/4
So now it should be that $I$ is given by $k + v\cdot c + c \cdot (M c).$ We check that this is the case below:
bool( I == k + v*c + c*(M*c) )
True
Great, now we can in theory do some linear-algebraic simplifications. We will be computing the eigenvalues and eigenvectors of $M$. But unfortunately the entries of $M$ are in the Symbolic Ring (SR), and this is very slow for calculations. (Try computing M.eigenvectors_right() and you will see what I mean.) If the entries of M were algebraic, I would probably use AA, but I suspect that $\sin(3)$ is transcendental.
So we will convert our formula into numerical objects and then do the analysis. We must convert $k$, $v$, and $M$ into numerical objects. We use RDF, which uses machine doubles and is very quick. If we want more precision, we can always use a RealField.
kn = RDF(k)
kn
2.4666587262585047
vn = vector(RDF, v)
vn
(-7.959969986401782, 0.0, 0.0, -3.6834325933188516, -3.6834325933188516, 0.0, -6.266502920722809, -6.266502920722809)
Mn = matrix(RDF, M)
show(Mn)
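If more precision were ever needed, the same conversions work with a high-precision real field; a quick sketch (the choice of 100 bits is our own):

# Sketch: converting to 100-bit precision instead of machine doubles.
R100 = RealField(100)
kn_hp = R100(k)
vn_hp = vector(R100, v)
Mn_hp = matrix(R100, M)
kn_hp   # the constant term to 100 bits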
Now let us do the analysis. The matrix $M$ is a real symmetric matrix. Recall that such matrices are orthogonally diagonalizable. This means that you can write $M=P D P^{-1}$ with $D$ diagonal and the columns of $P$ orthogonal unit vectors (they are orthonormal). To check that the columns of $P$ are orthonormal, you can check that $P P^T=I$.
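As a tiny exact illustration of this (a toy example of ours):

# A symmetric 2x2 matrix S satisfies S*P2 == P2*D2; its eigenvector columns
# are orthogonal, and normalizing them would make P2 orthonormal.
S = matrix(QQ, [[2, 1], [1, 2]])
D2, P2 = S.eigenmatrix_right()
S*P2 == P2*D2   # True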
For some reason, the diagonalization method only allows us to work over exact fields (and RDF is not exact). We can build the matrix $P$ by hand using eigenvectors_right. First maybe we check the eigenvalues:
Mn.eigenvalues()
[27.77883807443963, 4.087219458809409, 0.0006074720416440338, 0.0016032069266726213, 4.780091811962236, 6.75, 6.75, 5.0625]
So, it seems that there are $7$ distinct eigenvalues, with $6.75$ repeated twice. This is a potential headache because we will need two eigenvectors with eigenvalue $6.75$, and there is no reason to assume that eigenvectors_right will produce a pair of orthogonal eigenvectors with this common eigenvalue. (We could always apply the Gram-Schmidt orthogonalization process.) Anyway, let's see what we get:
evs = Mn.eigenvectors_right()
print(len(evs))
evs
8
[(27.77883807443963, [(-0.5608040574870358, 0.0, 0.0, -0.4667897246012352, -0.4667897246012352, 0.0, -0.3533507571234874, -0.3533507571234871)], 1), (4.087219458809409, [(-0.25373688845903325, 0.0, 0.0, 0.5031397126997728, 0.5031397126997735, 0.0, -0.46331331215706933, -0.46331331215706917)], 1), (0.0006074720416440338, [(0.7881093836144863, 0.0, 0.0, -0.17016986369752268, -0.1701698636990467, 0.0, -0.40060456464806565, -0.4006045646516556)], 1), (0.0016032069266726213, [(-5.785868305212902e-14, 0.0, 0.0, 0.27639390621991333, -0.2763939062198885, 0.0, 0.6508505270832489, -0.6508505270831899)], 1), (4.780091811962236, [(-9.198496766060281e-17, 0.0, 0.0, -0.6508505270832193, 0.6508505270832197, 0.0, 0.2763939062199008, -0.2763939062199011)], 1), (6.75, [(0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0)], 1), (6.75, [(0.0, 0.0, 1.0, 0.0, 0.0, 0.0, 0.0, 0.0)], 1), (5.0625, [(0.0, 0.0, 0.0, 0.0, 0.0, 1.0, 0.0, 0.0)], 1)]
The output seems for some reason not to group the common eigenvalues together; we have $8$ entries above. We pull out the eigenvectors, normalize them, and place them in a matrix as column vectors. (It is easy to place them in a matrix as row vectors, so we take a transpose.)
eigenvectors = [evs[i][1][0] for i in range(8)]
# Normalize the eigenvectors:
for i in range(8):
    eigenvectors[i] /= eigenvectors[i].norm()
# Place them in a matrix as rows and take the transpose:
P = matrix([eigenvectors[i] for i in range(8)]).transpose()
show(P)
We should check that this $P$ really is orthonormal (especially because of the issue with the repeated eigenvalue). We check that $PP^T$ is the identity matrix. (Because we are working with doubles, we should not expect this to be exact.)
To check that a matrix is very close to zero we can use a matrix norm. The $1$-norm gives the largest sum of absolute values of a column of the matrix. (Check the documentation with M.norm?.) In any case, if this number is small, then the matrix is very close to zero.
( P*P.transpose() - identity_matrix(8) ).norm(1)
3.923717124304088e-12
So all the entries of $P P^T$ are within $10^{-11}$ of the identity matrix. This is reasonable for numerical errors. We can safely accept that $P P^T$ is the identity.
Now, it should also be that $P^T M P$ is (nearly) diagonal. In fact, the diagonal entries are the corresponding eigenvalues. We can construct the diagonal matrix that should be equal to $P^T M P$ using diagonal_matrix.
D = diagonal_matrix(Mn.eigenvalues())
( P.transpose() * Mn * P - D ).norm(1)
2.171600740900731e-14
Here we see that $P^T M P$ has entries within $10^{-13}$ of the diagonal matrix $D$. Great. We have $P^T M P = D$ up to numerical errors.
Now let's think a little bit about what we are doing. The matrix $P$ represents a change of coordinates that makes $M$ into a diagonal matrix. (Note $P^{-1} = P^T$, so we are conjugating $M$ by $P$.) In fact, because the columns of $P$ are orthonormal, multiplication by $P$ (or $P^{-1}$) preserves the lengths of vectors. Anyway, we can let $d = P^{-1} c$. This is just a coordinate change, but in $d$ coordinates the expression for $I$ looks nicer.
d = vector([var(f'd{i}') for i in range(8)]) # construct a $d$ vector with generic entries
d
(d0, d1, d2, d3, d4, d5, d6, d7)
We create a dictionary that converts from $d$ into $c$ coordinates:
c_in_d_coordinates = P*d
c_to_d_substituion = {c[i]: c_in_d_coordinates[i] for i in range(8)}
c_to_d_substituion
{c0: -0.5608040574870359*d0 - 0.25373688845903325*d1 + 0.7881093836144863*d2 - (5.785868305212904e-14)*d3 - (9.198496766060281e-17)*d4, c1: 1.0*d5, c2: 1.0*d6, c3: -0.4667897246012353*d0 + 0.5031397126997728*d1 - 0.17016986369752268*d2 + 0.2763939062199134*d3 - 0.6508505270832193*d4, c4: -0.4667897246012353*d0 + 0.5031397126997735*d1 - 0.1701698636990467*d2 - 0.2763939062198886*d3 + 0.6508505270832197*d4, c5: 1.0*d7, c6: -0.35335075712348746*d0 - 0.46331331215706933*d1 - 0.40060456464806565*d2 + 0.650850527083249*d3 + 0.2763939062199008*d4, c7: -0.3533507571234872*d0 - 0.46331331215706917*d1 - 0.4006045646516556*d2 - 0.65085052708319*d3 - 0.2763939062199011*d4}
Here is a numerical version of the quadratic part $c \cdot (Mc)$ of the integral, expressed in $d$ coordinates:
quadratic_terms_in_d_coordinates = ( c*(Mn*c) ) .subs(c_to_d_substituion).expand()
quadratic_terms_in_d_coordinates
27.778838074439648*d0^2 - (2.6645352591003757e-15)*d0*d1 + 4.087219458809407*d1^2 + (3.1086244689504383e-15)*d0*d2 - (1.2212453270876722e-15)*d1*d2 + 0.0006074720416468329*d2^2 + (1.7763568394002505e-15)*d0*d3 - (9.992007221626409e-16)*d1*d3 + (8.743046176256295e-15)*d2*d3 + 0.0016032069266739446*d3^2 + (1.7763568394002505e-15)*d0*d4 + (7.993605777301127e-15)*d1*d4 + (3.6637359812630166e-15)*d2*d4 + (2.1094237467878195e-15)*d3*d4 + 4.7800918119622375*d4^2 + 6.75*d5^2 + 6.75*d6^2 + 5.0625*d7^2
Observe that this is the same thing as $d \cdot (D d)$ up to numerical error. This follows because $c = Pd$, so $$c \cdot (M c) = c^T M c = (P d)^T M (P d) = d^T P^T M P d = d^T D d = d \cdot (D d).$$ But you can also observe that all the coefficients in the difference below are minuscule:
quadratic_terms_in_d_coordinates2 = d*(D*d)
quadratic_terms_in_d_coordinates - quadratic_terms_in_d_coordinates2
(1.7763568394002505e-14)*d0^2 - (2.6645352591003757e-15)*d0*d1 - (2.6645352591003757e-15)*d1^2 + (3.1086244689504383e-15)*d0*d2 - (1.2212453270876722e-15)*d1*d2 + (2.7990847487058268e-15)*d2^2 + (1.7763568394002505e-15)*d0*d3 - (9.992007221626409e-16)*d1*d3 + (8.743046176256295e-15)*d2*d3 + (1.3233771717358067e-15)*d3^2 + (1.7763568394002505e-15)*d0*d4 + (7.993605777301127e-15)*d1*d4 + (3.6637359812630166e-15)*d2*d4 + (2.1094237467878195e-15)*d3*d4 + (1.7763568394002505e-15)*d4^2
Now let $\lambda_-$ denote the minimal eigenvalue. Observe that $\lambda_-$ is positive:
lambda_minus = min(Mn.eigenvalues())
lambda_minus
0.0006074720416440338
Uniqueness of the critical point
Now we see that for each $i$ we have $d_i (Dd)_i = \lambda_i d_i^2 \geq \lambda_- d_i^2$, where $\lambda_i$ is the $i$-th eigenvalue. Then we have $$d \cdot (D d) = \sum_i d_i (D d)_i \geq \sum_i \lambda_- d_i^2 = \lambda_- \|d\|^2,$$ where $\|d\|$ denotes the Euclidean norm of $d$.
Since the matrix $P$ is orthonormal and $c = P d$, we know that $\|c\|=\|d\|$. It then follows that $$c \cdot (M c) = d \cdot (D d) \geq \lambda_- \|d\|^2 = \lambda_- \|c\|^2.$$ In particular, the magnitude of this term grows quadratically in the Euclidean length of $c$.
Now recall our expression for the integral $I$: $$I = k + v \cdot c + c \cdot (M c).$$ Recall that $k$ is a constant. Observe that $v \cdot c$ is bounded by $\|v\| \|c\|$, i.e., it grows at most linearly in the length of $c$. So for large values of $\|c\|$, the quadratic term dominates. We conclude that $$\lim_{\|c\| \to +\infty} I = +\infty.$$ This means that $I$, viewed as a function of $c$, must have a global minimum. Furthermore, because $I$ is differentiable, any local minimum must be at a critical point. In fact, because $I$ is a quadratic polynomial, the derivative is affine linear. Consequently, the collection of critical points is an affine linear subset of $\mathbb R^8$ (the space containing $c$). We claim this subset is a single point. If it were not, then it would have dimension one or more, which would mean it extends to infinity in $\mathbb R^8$. But the function $I$ would be constant on this subset, which contradicts our limit formula above. So, indeed, the set of critical points is a single point.
Computation of the critical point
We will explain how to compute the critical point from our formula: $$I(c) = k + v \cdot c + c \cdot (M c).$$ The gradient of this expression can be seen to be $$\nabla I(c) = v + M c + M^T c = v + 2Mc,$$ where we are using that $M=M^T$. Solving for $\nabla I = 0$, we get that the critical point $c_0$ is given by $$c_0 = \frac{-1}{2} M^{-1} v.$$
A numerical solution is then given by c_0 below:
c_0 = -1/2*Mn.inverse()*vn
c_0
(-0.44222055480031486, 0.0, 0.0, 8.526512829121202e-14, 1.1368683772161603e-13, 0.0, 0.6649966577364239, 0.6649966577365376)
We will now plug these values into our general linear combination $g$ obtaining our solution $g_0$:
g
(x, y) |--> c3*x^2 + c5*x*y + c4*y^2 + c1*x + c2*y + c6*cos(x) + c7*cos(y) + c0
c_0_substitution = {c[i]: c_0[i] for i in range(8)}
c_0_substitution
{c0: -0.44222055480031486, c1: 0.0, c2: 0.0, c3: 8.526512829121202e-14, c4: 1.1368683772161603e-13, c5: 0.0, c6: 0.6649966577364239, c7: 0.6649966577365376}
g_0 = g.subs(c_0_substitution)
g_0
(x, y) |--> (8.526512829121202e-14)*x^2 + (1.1368683772161603e-13)*y^2 + 0.6649966577364239*cos(x) + 0.6649966577365376*cos(y) - 0.44222055480031486
Now we plot both to make sure they make sense:
plt2 = plot3d(g_0, (x, -r, r), (y, -r, r), color='red')
plt1 + plt2
General remarks:
In general, we can always express the quantity in a least squares problem as $$I(c) = k + v \cdot c + c \cdot (M c),$$ where $k$ is a constant, $v$ is a vector, and $M$ is a matrix as above. Then the solution will always be given by the formula $$c_0 = \frac{-1}{2} M^{-1} v.$$
To see that $M$ is invertible, it suffices to know that all the eigenvalues of $M$ are positive. Positivity of the minimal eigenvalue $\lambda_-$ is also exactly what the argument above required.
It is fairly easy to see that no eigenvalue of $M$ can be negative. By construction, the quantity $I$ is non-negative. If you move in the direction of a unit eigenvector $w$ with negative eigenvalue $\lambda$, the quantity $c \cdot (M c)$ decreases quadratically: indeed, $I(tw) = k + t(v \cdot w) + \lambda t^2$, which tends to $-\infty$ as $t \to +\infty$. This quadratic decrease dominates any contribution from the linear term, so we would get a contradiction to $I \geq 0$.
To show that zero cannot be an eigenvalue of $M$ is a bit more subtle. Suppose $w$ is an eigenvector for $M$ with eigenvalue zero. Consider plugging in $c=t w$ for $t \in \mathbb R$. We'd have $Mw =0$, so $$I(tw) = k + t (v \cdot w).$$ Since $I(tw) \geq 0$ for all $t$, we see that $v \cdot w=0$. Thus $I(tw)=k$ for all $t$. Now consider what this says about the integral. To fix notation, let $\{h_0(x,y), \ldots, h_{n-1}(x,y)\}$ be the collection of functions we are considering linear combinations of. Then consider the function $$\phi(x,y)= w_0 h_0(x,y) + \ldots + w_{n-1} h_{n-1}(x,y).$$ Using the integral from the problem (for notational purposes), we have for $t \in \mathbb R$ that $$k = I(t \phi) = \iint_S \big(f(x,y)-t \phi(x,y)\big)^2~dx~dy.$$ Consider the values $t=-1$ and $t=1$. We see: $$k = \iint_S \big(f(x,y) + \phi(x,y)\big)^2~dx~dy=\iint_S \big(f(x,y) - \phi(x,y)\big)^2~dx~dy.$$ Taking the average of these two expressions, we obtain $$k = \iint_S f(x,y)^2 + \phi(x,y)^2~dx~dy.$$ Now taking the value $t=0$, we see $k = \iint_S f(x,y)^2~dx~dy$. Taking the difference, we see that $$0 = \iint_S \phi(x,y)^2~dx~dy.$$ Assuming all our functions $h_0, \ldots, h_{n-1}$ are continuous, so is $\phi$. It then follows from the above that $\phi \equiv 0$ on $S$.
Consequently, we have shown that the only way a zero eigenvalue can occur is if our space of continuous functions $$\mathrm{span}~\{h_0, \ldots, h_{n-1}\}$$ contains a non-trivial linear combination which is identically zero on the region $S$ we are integrating over. In that case, the functions should not be considered linearly independent (as functions on $S$). We have proved:
Theorem. In a least squares problem where we minimize an integral of the form $$I = \int_S (f-g)^2$$ over $g$ in the span of continuous functions which are linearly independent as functions restricted to $S$, the minimum $g_0$ is unique and given by the unique critical point of $I$ viewed as a function of the coefficients of the linear combination.
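To summarize, the whole procedure can be packaged in a short function; here is a sketch under the assumptions of the theorem (the helper least_squares_fit and its variable names are our own, not part of the lecture):

# Sketch: least squares fit over the square [-r,r] x [-r,r], assuming the
# basis functions are continuous and linearly independent on the square.
def least_squares_fit(f, basis, r):
    n = len(basis)
    cs = vector([var(f'cc{i}') for i in range(n)])
    g = sum(cs[i]*basis[i] for i in range(n))
    I = ((f(x, y) - g)^2).integral(x, -r, r).integral(y, -r, r)
    # Quadratic part: divide the off-diagonal coefficients by two, as above.
    M = matrix([[I.coefficient(cs[i]*cs[j]) if i == j
                 else I.coefficient(cs[i]*cs[j])/2
                 for j in range(n)] for i in range(n)])
    I1 = (I - cs*(M*cs)).expand()
    v = vector([I1.coefficient(cs[i]) for i in range(n)])
    # The unique critical point: c_0 = -1/2 * M^-1 * v.
    c0 = -1/2*matrix(RDF, M).inverse()*vector(RDF, v)
    return sum(c0[i]*basis[i] for i in range(n))

# For example, this should reproduce g_0 from above, up to numerical error:
# least_squares_fit(f, [1, x, y, x^2, y^2, x*y, cos(x), cos(y)], 3/2)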